| Team | Points | Assists | Rebounds |
|---|---|---|---|
| A | 88 | 12 | 22 |
| B | 91 | 17 | 28 |
| C | 99 | 24 | 30 |
| D | 94 | 28 | 31 |
Today we will…
Artwork by Allison Horst
Which data follows a tidy data format?
.csv : “Comma-separated”
Name, Age
Bob, 49
Joe, 40
.xls, .xlsx: Microsoft Excel Spreadsheet
- Common approach: save as .csv
- Nicer approach: readxl package

.txt: Plain text
- Could be just text
- Could be comma-separated data
- Could be tab-separated, bar-separated, etc.
- Need to let R know what to look for

Let’s look at the Age_Data directory from this week’s Check-In for a better understanding of these file types.
The tidyverse has some cleaned-up versions in the readr and readxl packages:
read_csv() works like read.csv(), with some extras (for example, it returns a tibble and reports the column types it guessed)
read_tsv() is for tab-separated data
read_table() is for any data with “columns” (white space separating)
read_delim() is for special “delimiters” separating data
read_excel() is specifically for dealing with Excel files
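As a minimal sketch of two of these readers, the example below writes a comma-separated and a bar-separated copy of the small Name/Age table to a temporary folder and reads each back in. The file names `ages.csv` and `ages_bar.txt` are made up for illustration; they are not part of the Check-In materials.

```r
library(readr)

# Write two tiny files to a temporary folder so the example is self-contained.
# (File names are hypothetical, chosen only for this sketch.)
tmp_dir  <- tempdir()
csv_path <- file.path(tmp_dir, "ages.csv")
bar_path <- file.path(tmp_dir, "ages_bar.txt")

writeLines(c("Name,Age", "Bob,49", "Joe,40"), csv_path)
writeLines(c("Name|Age", "Bob|49", "Joe|40"), bar_path)

# Comma-separated file: read_csv() guesses column types and returns a tibble.
ages_csv <- read_csv(csv_path, show_col_types = FALSE)

# Bar-separated file: read_delim() must be told which delimiter to look for.
ages_bar <- read_delim(bar_path, delim = "|", show_col_types = FALSE)

ages_csv
ages_bar
```

Both calls should give the same two-row table; the only difference is how R is told where one column ends and the next begins.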
Let’s:
Create a new directory called learn_by_doing
Load the nba_data.xlsx and get acquainted with the data.
Use here::here() to load the data into R.
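One way these steps might look in code is sketched below. It assumes that `learn_by_doing` sits directly inside your project root and that `nba_data.xlsx` has been saved into it; adjust the path pieces if your layout differs.

```r
library(here)
library(readxl)

# Assumed layout (not confirmed by the slides):
#   <project root>/learn_by_doing/nba_data.xlsx
nba_path <- here("learn_by_doing", "nba_data.xlsx")

# Only read the file if it is actually there, so the sketch fails gently.
if (file.exists(nba_path)) {
  nba <- read_excel(nba_path)
  head(nba)   # first few rows, to get acquainted with the data
} else {
  message("File not found at: ", nba_path)
}
```

Building the path with `here()` rather than typing it out means the same code works regardless of which folder R happens to be running in, as long as it is inside the project.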
APIs provide structured, real-time data access, allowing custom queries and often requiring authentication. They are essential in modern data science for accessing updated and specific data sets.
- spotifyr package, directly interfacing with Spotify’s Web API.
- tidyquant package, integrating with financial markets data.
- tidycensus package, simplifying the use of US Census Bureau data.
Think of a data visualization or graph as a mapping
GoG components, as specified in R’s ggplot2
- data
- aes: aesthetic mappings (position, length, color, symbol, …)
- geom: geometric element (point, line, bar, …)
- stat: statistical variable transformation (identity, count, linear model, quantile, …)
- scale: scale transformation (log scale, color mapping, axes tick breaks, …)
- coord: Cartesian, polar, map projection, …
- facet: divide into subplots / small multiples using a categorical variable

Of course, we can also control axes, legends, titles … (guides)

In ggplot2, we map variables from the data set to aesthetics on the chart
Not an exhaustive list – see ggplot2 cheat sheet
Global Aesthetics
Local Aesthetics
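The difference between the two can be sketched with the `mpg` data set that ships with ggplot2 (the choice of data set and variables is an assumption for illustration only). Aesthetics set in `ggplot()` are inherited by every layer, while aesthetics set inside a geom apply to that layer alone.

```r
library(ggplot2)

# Global aesthetics: set in ggplot(), inherited by every geom layer.
# Because color = drv is global, geom_smooth() fits one line per drive type.
p_global <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Local aesthetics: set inside a geom, used by that layer only.
# Points are colored by drive type, but geom_smooth() fits one overall line.
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) +
  geom_smooth(method = "lm", se = FALSE)

p_global
p_local
```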
In ggplot2, we use a geom function to represent data points, and use the geom’s aesthetic properties to represent variables.
Not an exhaustive list – see ggplot2 cheat sheet
One variable:
- geom_density()
- geom_dotplot()
- geom_histogram()
- geom_boxplot()

Two variables:
- geom_point()
- geom_line()
- geom_density_2d()

Three variables:
- geom_contour()
- geom_raster()

Once our data is formatted and we know what type of variables we are working with, we can select the correct geom for our visualization.
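A short sketch of one geom from each group follows, using the `mpg` and `faithfuld` data sets bundled with ggplot2 (picked here only as convenient examples).

```r
library(ggplot2)

# One variable: distribution of highway mileage.
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 20)

# Two variables: engine displacement against highway mileage.
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Three variables: x, y, and a fill value for each cell of a regular grid.
p_raster <- ggplot(faithfuld, aes(x = waiting, y = eruptions, fill = density)) +
  geom_raster()

p_hist
p_scatter
p_raster
```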
A stat builds a new variable to plot (e.g., count and proportion)
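For instance, `geom_bar()` uses the count stat by default, and the same stat also computes a proportion that can be plotted instead. The sketch below assumes the `mpg` data set from ggplot2 and uses `after_stat()` to reach the computed variable.

```r
library(ggplot2)

# Default for geom_bar(): stat = "count", so bar height is the number of rows
# in each class. The count is a new variable that is not in the raw data.
p_count <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Same stat, but plot the computed proportion instead of the raw count.
# group = 1 makes the proportions sum to 1 across all bars.
p_prop <- ggplot(mpg, aes(x = class, y = after_stat(prop), group = 1)) +
  geom_bar()

p_count
p_prop
```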
A way to extract subsets of data and place them side-by-side in graphics
Note
sometimes called small multiples
- facet_grid(. ~ b): facet into columns based on b
- facet_grid(a ~ .): facet into rows based on a
- facet_grid(a ~ b): facet into both rows and columns
- facet_wrap( ~ fl): wrap facets into a rectangular layout

You can set scales to let axis limits vary across facets:
facet_grid(y ~ x, scales = "free"): x and y axis limits adjust to individual facets
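A minimal sketch of these options, assuming the `mpg` data set from ggplot2 and its `cyl`, `drv`, and `class` columns as the faceting variables:

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Rows by number of cylinders, columns by drive type.
p_grid <- base_plot +
  facet_grid(cyl ~ drv)

# Wrap a single variable into a rectangular layout,
# letting each panel choose its own x and y axis limits.
p_wrap <- base_plot +
  facet_wrap(~ class, scales = "free")

p_grid
p_wrap
```

Free scales make each panel easier to read on its own, at the cost of making comparisons across panels harder, so they are worth using deliberately.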
You can also set a labeller to adjust facet labels:
- facet_grid(. ~ fl, labeller = label_both)
- facet_grid(. ~ fl, labeller = label_bquote(alpha ^ .(x)))
- facet_grid(. ~ fl, labeller = label_parsed)

Position adjustments determine how to arrange geoms that would otherwise occupy the same space

- position = 'dodge': Arrange elements side by side
- position = 'fill': Stack elements on top of one another, normalize height
- position = 'stack': Stack elements on top of one another
- position = 'jitter': Add random noise to X & Y position of each element to avoid overplotting (see geom_jitter())

Clearer labels with labs()
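The sketch below combines a position adjustment with `labs()`, again assuming the `mpg` data set from ggplot2; the label text is invented for illustration.

```r
library(ggplot2)

# Bars for each class, split by drive type and placed side by side.
p_dodge <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Vehicle classes by drive type",
    x = "Vehicle class",
    y = "Number of models",
    fill = "Drive type"
  )

# Same data, stacked and normalized so every bar has total height 1.
p_fill <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion of models")

p_dodge
p_fill
```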
Tip
Notice how there is a lot of nesting that happens within ggplot2 code (e.g., parentheses within parentheses). It is good practice to put each geom and aesthetic on a new line. This makes code easier to read!
The general guideline is that each line of your code should not be over 80 characters long.
Tip
I encourage you to use your neighbors for support!
Note
I have office hours TODAY, Tuesday (1/17) from 2:40pm - 3:30pm in 25-103
Today we will…
What makes bad figures bad?
Edward R. Tufte is a well-known critic of this style of visualization:
bad data.
Looking at pictures of data means looking at lines, shapes, and colors
Our visual system works in a way that makes some things easier for us to see than others
Graphics consist of:
Structure: boxplot, scatterplot, etc.
Aesthetics: features such as color, shape, and size that map other characteristics to structural features
Both the structure and aesthetics should help viewers interpret the information.
What sorts of relationships are inferred, and under what circumstances?
| Gestalt Hierarchy | Graphs |
|---|---|
| Enclosure | Facets |
| Connection | Lines |
| Proximity | White Space |
| Similarity | Color/Shape |
Implications for practice
Pre-Attentive Features are things that “jump out” in less than 250 ms
There is a hierarchy of features
Hue: shade of color (red, orange, yellow…)
Intensity: amount of color
Both hue and intensity are pre-attentive. Bigger contrast corresponds to faster detection.
Use color to your advantage
When choosing color schemes, we will want mappings from data to color that are not just numerically but also perceptually uniform
Distinguish between sequential scales and categorical scales
No more than 7 colors
Can use colorRampPalette() (from base R’s grDevices package) to produce larger palettes by interpolating existing ones, such as those in the RColorBrewer package
Use color gradient with only one hue for positive values
Use color gradient with two hues for positive and negative values. Gradient should go through a light, neutral color (white)
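As a rough sketch of these guidelines, the example below builds a single-hue sequential palette, stretches it with `colorRampPalette()`, and pulls a two-hue diverging palette. The specific palettes ("Blues" and "RdBu") and sizes are just one possible choice.

```r
library(RColorBrewer)

# Sequential, single-hue palette: ColorBrewer "Blues" offers at most 9 colors.
blues_9 <- brewer.pal(9, "Blues")

# colorRampPalette() (base R, grDevices) returns a function;
# calling it with 15 interpolates the 9 colors into a 15-color ramp.
blues_15 <- colorRampPalette(blues_9)(15)

# Diverging, two-hue palette (red to blue) with a light neutral midpoint,
# suited to values that can be positive or negative.
red_blue_11 <- brewer.pal(11, "RdBu")

length(blues_15)
red_blue_11
```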
There are packages available for use that have color scheme options.
Some Examples:
There are packages such as RColorBrewer and dichromat that have color palettes which are aesthetically pleasing, and, in many cases, colorblind friendly.
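One way to apply such palettes in ggplot2 is through its built-in scale functions, sketched here with the `mpg` data set (the palette names are example choices, not recommendations from the slides).

```r
library(ggplot2)

# Categorical variable: use a qualitative ColorBrewer palette.
p_categorical <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")

# Numeric variable: use a sequential, perceptually uniform viridis scale.
p_sequential <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point() +
  scale_color_viridis_c()

p_categorical
p_sequential
```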
You can also take a look at other ways to find nice color palettes.